Sync dotnet CsvFileReader with Python multiline-field fix#2
Merged
sharpninja merged 2 commits intomainfrom Feb 28, 2026
Merged
Conversation
…2248) Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Sync dotnet code with python changes since last commit
Sync dotnet CsvFileReader with Python multiline-field fix
Feb 28, 2026
There was a problem hiding this comment.
Pull request overview
Syncs the .NET CsvFileReader behavior with the Python multiline-CSV fix by switching from line-by-line parsing to a full-content parser so quoted fields can safely contain embedded newlines.
Changes:
- Replaced the line-based CSV parsing loop with a character-by-character
ParseCsvContentimplementation to support quoted multiline fields and escaped quotes. - Added guards for empty headers / trailing content and skipped simple blank lines.
- Added a new unit test suite for CSV parsing scenarios (multiline fields, commas-in-quotes, escaped quotes, empty inputs, no matches).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| dotnet/src/GraphRag.Input/CsvFileReader.cs | Implements full-content CSV parsing to preserve multiline quoted fields and improve handling of quotes/commas. |
| dotnet/tests/GraphRag.Tests.Unit/Input/CsvFileReaderTests.cs | Adds unit coverage for CSV parsing, including multiline quoted-field preservation. |
Comments suppressed due to low confidence (1)
dotnet/src/GraphRag.Input/CsvFileReader.cs:137
- ParseCsvContent adds a row on every \r/\n boundary without checking whether the row is entirely empty. This means a trailing row like ",," (or any all-empty row terminated by a newline) will be kept and later turned into an empty TextDocument, even though the method’s final flush explicitly skips all-empty trailing rows. Consider skipping row creation when all collected fields are empty/whitespace (e.g., check fields.TrueForAll(string.IsNullOrWhiteSpace) before rows.Add), and add a unit test covering a trailing all-empty row with multiple columns.
else if (c == '\r' || c == '\n')
{
// End of row — consume \r\n as a single line ending.
fields.Add(field.ToString());
field.Clear();
rows.Add([.. fields]);
fields.Clear();
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Syncs the .NET codebase with Python changes merged since the last dotnet commit (Feb 23). The primary functional gap was
CsvFileReadersilently corrupting multiline quoted CSV fields by parsing line-by-line — matching the Python fix in graphrag-input microsoft#2248.Related Issues
Syncs with upstream Python commits:
64c5552— fix csv file reader (fix csv file reader microsoft/graphrag#2248)6f26d0e— vector load_documents in batches (vector load_documents in batches microsoft/graphrag#2251) — dotnet already alignedProposed Changes
dotnet/src/GraphRag.Input/CsvFileReader.csReadLineAsyncloop + single-lineParseCsvLinewith a full-contentParseCsvContentcharacter-by-character parser\n/\r\nin quoted fields,""escape sequences, commas inside quoted valuesBefore, this CSV would silently produce wrong output:
After,
docs[0].Text == "Line one.\nLine two.\nLine three."— matching Pythoncsv.DictReaderbehaviour.dotnet/tests/GraphRag.Tests.Unit/Input/CsvFileReaderTests.cs(new)test_csv_loader_preserves_multiline_fields), comma-inside-quote, escaped double-quote, empty content, no-matchChecklist
Additional Notes
The vector store batching changes from microsoft#2251 (LanceDB, Azure AI Search, CosmosDB) required no dotnet changes — the existing implementations already use equivalent batch semantics. LanceDB remains a
NotImplementedExceptionstub pending an official .NET SDK.💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.